Multiple sequence alignment with user-defined constraints at GOBICS
Abstract
Summary: Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software program uses these sites as anchor points and creates a multiple alignment respecting these user-defined constraints. By using known functionally, structurally or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relations among the input sequences more accurately than fully automated procedures can.

Availability: Our software is available online at the GÖttingen BIoinformatics Compute Server (GOBICS), http://dialign.gobics.de/anchor/index.php

Contact: [email protected]

A large number of multi-alignment programs have been developed during the last twenty years, see [10, 14] for recent reviews; the performance of these tools has been studied extensively [3, 11]. Practically all state-of-the-art alignment methods are fully automated. They construct alignments following a fixed set of algorithmic rules where only a limited number of parameters can be adjusted by the user. Automatic alignment methods are clearly necessary in situations where no expert knowledge about the input sequences is available or where large amounts of data are to be processed. However, a researcher who is already familiar with a specific sequence family under study may know certain regions in the sequences that are functionally or phylogenetically related and should therefore be aligned to each other.
Here, it is useful to have an alignment method that can incorporate such user-defined homology information and then create an alignment respecting these constraints. Multiple alignment under constraints was proposed by Myers et al. [9] and, more recently, by Sammeth et al. [12] and Brown and Hudek [1]. The multi-alignment program DIALIGN [5, 6] has an option to calculate alignments under pre-defined constraints. This program feature was initially implemented to reduce the alignment search space and program running time for large genomic sequences [2, 8, 13]. However, user-defined constraints, or anchor points, as we call them, can also be used to improve the biological quality of multiple alignments. To this end, known homologies can be specified by the user. A semi-automatic alignment procedure is then carried out where the user-specified homologies are aligned wherever possible; the remainder of the sequences is then automatically aligned by DIALIGN according to these user-defined constraints. A detailed description of this algorithm is given in [7]; this paper also describes applications of our approach to genomic sequences around the Hox gene cluster.

To make our anchored multi-alignment tool easily available to the research community, we developed a WWW interface at GOBICS (GÖttingen BIoinformatics Compute Server). The user can specify an arbitrary number of anchor points that are taken into account for the alignment. Each of these anchor points corresponds to a pair of equal-length segments from two of the input sequences, see Figure 1. An anchor point is therefore characterized by five coordinates: the two sequences involved, the starting positions in the respective sequences and the length of the anchored segments. As a sixth parameter, our method requires a score that determines the priority of anchor points.
The score is necessary because it is in general not possible to use all proposed anchors simultaneously, so the algorithm may need to select a suitable subset of them. Here, our method uses the same greedy procedure that is used in the original DIALIGN approach to select consistent sets of local pairwise alignments for multiple alignment [4]. Our anchoring procedure works as follows: if a position in one of the input sequences is assigned to a position in another input sequence through one of the selected anchor points, this does not necessarily ...

Bioinformatics © Oxford University Press 2004; all rights reserved. Bioinformatics Advance Access published November 16, 2004
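The six anchor parameters and the score-driven greedy selection described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the DIALIGN implementation: the names `Anchor`, `compatible` and `select_anchors` are hypothetical, and the compatibility test only checks anchors on the same pair of sequences for overlaps and crossings. DIALIGN's actual consistency criterion also accounts for conflicts that arise across three or more sequences, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """One anchor point: two sequences, two start positions, a length, a score."""
    seq1: int      # index of the first sequence
    seq2: int      # index of the second sequence
    start1: int    # start position of the segment in seq1
    start2: int    # start position of the segment in seq2
    length: int    # common length of the two anchored segments
    score: float   # priority; higher scores are considered first

    def __post_init__(self):
        if self.seq1 == self.seq2:
            raise ValueError("an anchor must link two different sequences")
        if self.length <= 0:
            raise ValueError("anchor length must be positive")


def normalized(a):
    """Return an equivalent anchor with seq1 < seq2, so pairs compare directly."""
    if a.seq1 < a.seq2:
        return a
    return Anchor(a.seq2, a.seq1, a.start2, a.start1, a.length, a.score)


def compatible(a, b):
    """Simplified pairwise check (an assumption, not DIALIGN's full criterion).

    Anchors on the same sequence pair are compatible only if one lies entirely
    before the other in both sequences. Anchors on different sequence pairs are
    treated as compatible, which ignores conflicts spanning several sequences.
    """
    a, b = normalized(a), normalized(b)
    if (a.seq1, a.seq2) != (b.seq1, b.seq2):
        return True
    a_first = a.start1 + a.length <= b.start1 and a.start2 + a.length <= b.start2
    b_first = b.start1 + b.length <= a.start1 and b.start2 + b.length <= a.start2
    return a_first or b_first


def select_anchors(anchors):
    """Greedy selection: visit anchors by decreasing score, keep the compatible ones."""
    selected = []
    for candidate in sorted(anchors, key=lambda a: a.score, reverse=True):
        if all(compatible(candidate, kept) for kept in selected):
            selected.append(candidate)
    return selected


proposed = [
    Anchor(0, 1, 10, 12, 5, score=3.0),
    Anchor(0, 1, 20, 8, 4, score=1.0),   # crosses the first anchor
    Anchor(1, 0, 30, 25, 6, score=2.0),  # same pair, written in reverse order
    Anchor(0, 2, 10, 40, 5, score=0.5),  # different sequence pair
]
kept = select_anchors(proposed)
print([a.score for a in kept])
```

In this example the anchor with score 1.0 is dropped because it crosses the higher-scoring anchor on the same sequence pair, which shows why a priority score is needed when not all proposed anchors can be used at once.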
Related resources
DIALIGN-TX and multiple protein alignment using secondary structure information at GOBICS
We introduce web interfaces for two recent extensions of the multiple-alignment program DIALIGN. DIALIGN-TX combines the greedy heuristic previously used in DIALIGN with a more traditional 'progressive' approach for improved performance on locally and globally related sequence sets. In addition, we offer a version of DIALIGN that uses predicted protein secondary structures together with primary...
DIALIGN at GOBICS: multiple sequence alignment using various sources of external information
DIALIGN is an established tool for multiple sequence alignment that is particularly useful to detect local homologies in sequences with low overall similarity. In recent years, various versions of the program have been developed, some of which are fully automated, whereas others are able to accept user-specified external information. In this article, we review some versions of the program that ...
Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC
Comparative analysis of genomic sequences is a powerful approach to discover functional sites in these sequences. Herein, we present a WWW-based software system for multiple alignment of genomic sequences. We use the local alignment tool CHAOS to rapidly identify chains of pairwise similarities. These similarities are used as anchor points to speed up the DIALIGN multiple-alignment program. Fin...
Multiple sequence alignment with user-defined constraints
In many situations, automated multi-alignment programs are not able to correctly align families of nucleic acid or protein sequences. Difficult cases comprise not only distantly related sequences but also tandem duplications independent of their evolutionary age. Frequently, additional biological information is available that establishes homologies at least in parts of the sequences based on st...
Publication date: 2004